Article Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques By jmamou and 8 others • Mar 24 • 18
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models Paper • 2502.09390 • Published Feb 13 • 16
Article Assisted Generation: a new direction toward low-latency text generation By joaogante • May 11, 2023 • 64
Article Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon By danielkorat and 5 others • Apr 3, 2024 • 11
Article Faster Assisted Generation with Dynamic Speculation By jmamou and 6 others • Oct 8, 2024 • 47
Article SetFit: Efficient Few-Shot Learning Without Prompts By Unso and 5 others • Sep 26, 2022 • 28
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Paper • 2408.02545 • Published Aug 5, 2024 • 38
Article Our Transformers Code Agent beats the GAIA benchmark! By m-ric and 1 other • Jul 1, 2024 • 88
Article Training and Finetuning Embedding Models with Sentence Transformers v3 By tomaarsen • May 28, 2024 • 223
Accelerating Speculative Decoding using Dynamic Speculation Length Paper • 2405.04304 • Published May 7, 2024 • 2
Distributed Speculative Inference of Large Language Models Paper • 2405.14105 • Published May 23, 2024 • 19
Article Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon By juliensimon and 8 others • May 9, 2024 • 12
Article Introducing the Open Leaderboard for Hebrew LLMs! By Shaltiel and 3 others • May 5, 2024 • 45
Improving Classification Performance With Human Feedback: Label a few, we label the rest Paper • 2401.09555 • Published Jan 17, 2024 • 6
H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models Paper • 2306.14048 • Published Jun 24, 2023 • 12